Skip to content

MINOR: Fix IOOBE in ListVector/LargeListVector.getFieldBuffers() for never-allocated offset buffer#27

Open
prashanthbdremio wants to merge 11 commits into
dremio_27.0_23_19from
fix/list-vector-offset-buffer-ioobe
Open

MINOR: Fix IOOBE in ListVector/LargeListVector.getFieldBuffers() for never-allocated offset buffer#27
prashanthbdremio wants to merge 11 commits into
dremio_27.0_23_19from
fix/list-vector-offset-buffer-ioobe

Conversation

@prashanthbdremio

Copy link
Copy Markdown

Summary

  • Adds realloc guard to ListVector.setReaderAndWriterIndex() and LargeListVector.setReaderAndWriterIndex() to grow the offset buffer before setting writerIndex, mirroring what BaseVariableWidthVector already does
  • Fixes IndexOutOfBoundsException: readerIndex: 0, writerIndex: 4 (expected: 0 <= readerIndex <= writerIndex <= capacity(0)) when serializing empty ListVectors through Netty

Root cause

apacheGH-343 (Jan 2026) changed setReaderAndWriterIndex() to unconditionally set offsetBuffer.writerIndex((valueCount + 1) * OFFSET_WIDTH) so that even empty lists export the trailing-zero offset required by the Arrow IPC spec. However, unlike the analogous fix in BaseVariableWidthVector (lines 402-411), this change did not include a realloc guard.

When the offset buffer has never been allocated (capacity == 0) — which is the case for a freshly-constructed ListVector via ListVector.empty() or allocator.getEmpty() — the writerIndex is set to 4 (or 8 for LargeListVector) on a zero-capacity buffer. This is an inconsistent state that crashes NettyArrowBuf.unwrapBuffer() because Netty's AbstractByteBuf.writerIndex() bounds-checks against capacity.

ListVector.exportCDataBuffers() (line 252) already handles the capacity == 0 case correctly — setReaderAndWriterIndex() was the only path missing the guard.

Production impact

This caused SYSTEM ERROR: IndexOutOfBoundsException in Dremio Cloud DC-231 (Arrow Java 19.0.0) during REFRESH REFLECTION jobs on tables with FLATTEN(CONVERT_FROM(..., 'json')) producing empty inner lists. DC-229 (Arrow Java 18.3.0) was not affected because the pre-apacheGH-343 code set offsetBuffer.writerIndex(0) for valueCount == 0.

Test plan

  • Dremio unit test TestFragmentWritableBatch.emptyListVectorOffsetBufferIsInconsistentAfterUnload confirms the Arrow regression
  • Dremio unit test TestFragmentWritableBatch.fragmentWritableBatchCreateSucceedsForEmptyListVector confirms the fix
  • Both tests verified on DC-231 branch (Arrow 19)

🤖 Generated with Claude Code

apacheGH-343 (Jan 2026) changed setReaderAndWriterIndex() to unconditionally
set offsetBuffer.writerIndex((valueCount + 1) * OFFSET_WIDTH) so that
even empty lists export the trailing-zero offset required by the Arrow
IPC spec. However, unlike the analogous fix in BaseVariableWidthVector,
this change did not include a realloc guard — when the offset buffer has
never been allocated (capacity == 0), the writerIndex is set to 4 on a
zero-capacity buffer, producing the inconsistent state:

  readerIndex: 0, writerIndex: 4, capacity(0)

Netty's AbstractByteBuf.writerIndex() bounds-checks against capacity and
throws IndexOutOfBoundsException. This crashes Dremio's
FragmentWritableBatch serialization path (SingleSenderOperator,
PartitionSenderOperator, etc.) when an empty ListVector reaches the
VectorUnloader -> NettyArrowBuf.unwrapBuffer() chain.

The fix adds the same realloc guard that BaseVariableWidthVector already
uses: grow the offset buffer to the required size before setting
writerIndex. This mirrors what exportCDataBuffers() already does for the
capacity == 0 case.

Affects both ListVector (OFFSET_WIDTH = 4) and LargeListVector
(OFFSET_WIDTH = 8).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change-Id: Ib5ca53064ea5bc69c43fd6cc09692b07199c586c
prashanthbdremio and others added 5 commits June 18, 2026 13:27
Tests exercise the exact bug path: setValueCount(0) on a vector where
allocateNew() was never called, leaving the offset buffer at capacity 0.

- testEmptyListOffsetBufferWithoutAllocate: verifies getFieldBuffers()
  produces a valid offset buffer after the realloc guard fires
- testEmptyListGetBuffersWithoutAllocate: exercises getBuffers(false),
  the IPC serialization entry point that produced the original Netty
  IOOBE via VectorUnloader -> NettyArrowBuf.unwrapBuffer()
- Analogous tests for LargeListVector

The existing testEmptyListOffsetBuffer / testEmptyLargeListOffsetBuffer
tests call allocateNew() before setValueCount(0), so the offset buffer
always has nonzero capacity and the realloc guard is never entered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change-Id: I1972ad3d80f405ed9b41103e88902571b63defb0
The previous guard (capacity < required) could replace an already-allocated
offset buffer with a smaller one when setReaderAndWriterIndex() was called
before the vector was populated (e.g., during validateFull() on an empty
vector). The replacement buffer was too small for subsequent writes.

Narrowing to capacity==0 targets only the never-allocated empty singleton
from allocator.getEmpty(), which is the exact state that causes the IOOBE.
Buffers with capacity>0 were properly allocated and should not be replaced.

Also removes the dead copy branch (capacity>0 is always false when we enter
the guard).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change-Id: I28ee94d2a2b431f7699ce8c3ce133f184ede5458
The previous approach mutated this.offsetBuffer inside
setReaderAndWriterIndex(), which is called from getFieldBuffers(),
getBuffers(), and indirectly from validation. Replacing the offset buffer
during validation broke subsequent writes — tests that called
validateFull() on an empty vector before populating data got a 4-byte
buffer that was too small for the actual data.

The fix now mirrors exportCDataBuffers() exactly: getFieldBuffers()
detects the inconsistent state (capacity==0 but writerIndex>0, produced
by setReaderAndWriterIndex on a never-allocated buffer) and substitutes a
properly-sized temporary buffer for serialization. The vector's own
offsetBuffer is never mutated, so subsequent allocateNew/setValueCount
calls work normally.

setReaderAndWriterIndex() is reverted to upstream Arrow 19 behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change-Id: I696163fac4fcf5ada1af0edb5da2f4e8ba4e3e84
Tests now check the buffer RETURNED by getFieldBuffers() (which is a
temp allocation when the vector's offset buffer has capacity 0) instead
of checking the vector's own getOffsetBuffer() which is intentionally
not mutated.

Remove testEmptyListGetBuffersWithoutAllocate / LargeList variant —
getBuffers(false) returns an empty array for valueCount==0 (via the
getBufferSize()==0 guard), so the offset buffer is never exposed through
that path and doesn't need testing.

Release the temp buffer after assertions to avoid memory leak detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change-Id: I73877ba98047b6bda6fcf7737e53778deaa7cb74
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change-Id: I47527f1d5934a122c7c7381c037c4f951b672592
@prashanthbdremio prashanthbdremio changed the title Fix IOOBE in ListVector/LargeListVector.setReaderAndWriterIndex() MINOR: Fix IOOBE in ListVector/LargeListVector.getFieldBuffers() for never-allocated offset buffer Jun 18, 2026
@github-actions

Copy link
Copy Markdown

Thank you for opening a pull request!

Please label the PR with one or more of:

  • bug-fix
  • chore
  • dependencies
  • documentation
  • enhancement

Also, add the 'breaking-change' label if appropriate.

See CONTRIBUTING.md for details.

The previous approach returned a temp buffer that was not owned by the
vector. VectorUnloader.retain() added a ref but the original ref was
never released, leaking OFFSET_WIDTH bytes per empty list serialized.

Now getFieldBuffers() assigns the new buffer to this.offsetBuffer so
the vector owns it. Uses allocateOffsetBuffer(offsetAllocationSizeInBytes)
to get a full-sized buffer (no side effect since the argument matches the
field value), which is large enough for subsequent writes if the vector
is populated after validation — fixing the TestValidateVector regression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change-Id: I4d5caed0bf2d323215556d6a1a90330b4ab399e8
Comment thread vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java Outdated
prashanthbdremio and others added 2 commits June 18, 2026 16:11
…Bytes

Addresses review feedback from bodduv: LargeListVector.clear() sets
offsetAllocationSizeInBytes = offsetBuffer.capacity(), which is 0 after
the buffer is released. Using offsetAllocationSizeInBytes would allocate
a 0-byte buffer in that case.

(valueCount + 1) * OFFSET_WIDTH is always the correct minimum size
(at least OFFSET_WIDTH when valueCount == 0) and matches the writerIndex
that setReaderAndWriterIndex() just set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change-Id: Iae6b0a2578deb0f7fd8ca7ae36a0a7e3fc595765
…ression

(valueCount + 1) * OFFSET_WIDTH produces a 4-byte buffer for valueCount=0,
which is too small when validateFull() triggers getFieldBuffers() before
the vector is populated — subsequent setVector() writes at index 4+ fail.

Use max(required, INITIAL_VALUE_ALLOCATION * OFFSET_WIDTH) to match the
default allocateNew() size. Use allocator.buffer() directly instead of
allocateOffsetBuffer() to avoid mutating offsetAllocationSizeInBytes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change-Id: Iccb00d4775b196f7978b34cbc98234f50ede370a
List<ArrowBuf> result = new ArrayList<>(2);
setReaderAndWriterIndex();
result.add(validityBuffer);
if (offsetBuffer.capacity() == 0 && offsetBuffer.writerIndex() > 0) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only further tightening I could think of is to check offsetBuffer.capacity() < OFFSET_WIDTH, but I think the main issue was indeed with offsetBuffer.capacity() == 0.

@selvaganesang selvaganesang Jun 18, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are dealing with emptyVector issue, wouldn't be better to check if valueCount == 0 here. This seems to be fixing the unrelated issue where offset.capacity == 0 but with writerIndex was set incorrectly somewhere else.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2cd5472. The guard is now scoped to valueCount == 0, so this only handles the empty-vector serialization case. For non-empty vectors with a zero-capacity offset buffer we no longer synthesize zero offsets, since that would be an ambiguous/corrupt state rather than this bug.

@bodduv bodduv Jun 18, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach may not fix the issue if writeIndex was set incorrectly.
🤔 were there profiles of failed queries other than capacity = 0, writerIndex = 4?

Edit:
If writerIndex is set incorrectly and capacity is not zero (but say lesser than writerIndex), then I am not sure what else we could do here.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Then it is a different issue in Apache Arrow. That I believe is beyond the scope of this fix

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the customer profiles, I see this signature..
IndexOutOfBoundsException: readerIndex: 0, writerIndex: 4
Have to check in observe if all the profiles have it.. I checked in 10 profiles which Dmitry had given us

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this route is not the right way to fix.. there are more and more edge cases that we might hit.

List<ArrowBuf> result = new ArrayList<>(2);
setReaderAndWriterIndex();
result.add(validityBuffer);
if (offsetBuffer.capacity() == 0 && offsetBuffer.writerIndex() > 0) {

@selvaganesang selvaganesang Jun 18, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are dealing with emptyVector issue, wouldn't be better to check if valueCount == 0 here. This seems to be fixing the unrelated issue where offset.capacity == 0 but with writerIndex was set incorrectly somewhere else.

List<ArrowBuf> result = new ArrayList<>(2);
setReaderAndWriterIndex();
result.add(validityBuffer);
if (offsetBuffer.capacity() == 0 && offsetBuffer.writerIndex() > 0) {

@selvaganesang selvaganesang Jun 18, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. If it is not emptyVector issue, we may not know what should be the value in the offsetBuffer. Is it ok to be 0 in all indexes.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2cd5472. Same tightening applied here: valueCount == 0 && offsetBuffer.capacity() == 0 && offsetBuffer.writerIndex() > 0. That keeps the fix limited to the empty-vector trailing-offset case, where zero is the correct offset value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants